Skip to content

spawnSync: make signal-forwarding register/unregister safe for concurrent callers#30956

Open
robobun wants to merge 5 commits into
mainfrom
farm/a5eb0a5e/spawnsync-sigpwr
Open

spawnSync: make signal-forwarding register/unregister safe for concurrent callers#30956
robobun wants to merge 5 commits into
mainfrom
farm/a5eb0a5e/spawnsync-sigpwr

Conversation

@robobun
Copy link
Copy Markdown
Collaborator

@robobun robobun commented May 18, 2026

Fuzzer fingerprint: 11551e051c08db2c

What

  • Guard Bun__registerSignalsForForwarding() / Bun__unregisterSignalsForForwarding() with a mutex and depth counter so only the outermost register/unregister pair installs and restores process-wide signal handlers.
  • Drop SIGIOT and SIGPOLL from the forwarded signal list — they alias SIGABRT/SIGIO on Linux, so the second sigaction() call overwrote the saved previous handler with our own forwarding handler, and unregister then restored the wrong handler.
  • Add a Linux regression test that stresses many concurrent Bun.openInEditor calls (each runs spawnSync on a detached thread) against GC.

Why

The signal-forwarding helpers assume they run on the main thread only (previous_actions[] is an unsynchronized process-global), but Bun.openInEditor runs spawnSync on detached threads. Overlapping register/unregister calls corrupt the saved-handler table: a register can save our own forwarding handler as "previous", and an unregister after the table is zeroed installs SIG_DFL, silently dropping handlers for forwarded signals (e.g. the TTY-restoring SIGINT/SIGTERM handlers or user process.on(...) handlers).

The worst symptom of this race — losing JSC's SIGPWR GC suspend handler and dying with signal 30 — was fixed on main by removing SIGPWR from the list (#31183, same underlying fuzzer crash family). This PR hardens the remaining race for the other forwarded signals, which was also requested in review.

Test

test/js/bun/spawn/spawnSync-sigpwr-gc.test.ts (Linux-only): points editor auto-detection at a fake codesleep binary, fires 64 concurrent openInEditor calls so their spawnSync signal-forwarding windows overlap, then allocates and runs GC. Before the original SIGPWR fix this died with signal 30; it now also exercises the concurrent register/unregister path guarded here. Existing open-in-editor-gc.test.ts from #31183 still passes.

@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented May 18, 2026

Updated 1:48 PM PT - May 23rd, 2026

@robobun, your commit db5ce58 has 1 failures in Build #57350 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 30956

That installs a local version of the PR into your bun-30956 executable, so you can run:

bun-30956 --bun

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Review Change Stack

Walkthrough

Removed SIGPWR from the Linux-only signal-forwarding macro used by spawnSync, added process-wide mutex and reference-counting around signal handler register/unregister, and added a Linux regression test that stresses overlapping spawns and GC to confirm the SIGPWR handler remains intact.

Changes

Signal Forwarding Fix

Layer / File(s) Summary
Mutex and depth counter for signal forwarding
src/jsc/bindings/c-bindings.cpp
Add a global std::mutex and signalForwardingDepth reference counter to coordinate process-wide signal-forwarding install/remove logic.
Remove SIGPWR from Linux-only signals
src/jsc/bindings/c-bindings.cpp
Updated FOR_EACH_LINUX_ONLY_SIGNAL to exclude SIGPWR, keeping SIGPOLL only.
Guard register/unregister with mutex and depth checks
src/jsc/bindings/c-bindings.cpp
Bun__registerSignalsForForwarding() and Bun__unregisterSignalsForForwarding() now lock the global mutex and use the depth counter to early-return on nested calls so only the outermost call updates signal dispositions.
Linux regression test for SIGPWR and GC stress
test/js/bun/spawn/spawnSync-sigpwr-gc.test.ts
Added a Linux-gated test that symlinks sleep as the editor, spawns overlapping Bun editor processes while forcing GC, and asserts the child exits normally (no signal termination, exit code 0) and stdout/stderr meet expectations.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title accurately describes the main change: adding thread-safety mechanisms (mutex and depth counter) to signal-forwarding register/unregister functions for concurrent callers.
Description check ✅ Passed The pull request description comprehensively covers what the changes do, why they are necessary, and how they were tested. Both required template sections are addressed with detailed information.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/jsc/bindings/c-bindings.cpp (1)

936-953: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Signal-forwarding globals are still unsafe under overlapping spawnSync calls.

Removing SIGPWR fixes this specific crash path, but the forwarding mechanism still relies on unsynchronized process-global state (previous_actions, Bun__currentSyncPID, Bun__pendingSignalToSend). With concurrent spawnSync windows, handlers/PID state can still be clobbered and restored incorrectly for other forwarded signals.

Also applies to: 915-916, 979-1015

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/jsc/bindings/c-bindings.cpp` around lines 936 - 953, The
signal-forwarding globals (previous_actions, Bun__currentSyncPID,
Bun__pendingSignalToSend) used by spawnSync are not safe for overlapping calls;
serialize access and make the handler logic robust: introduce a single shared
mutex (e.g., spawnSyncSignalMutex) and lock it around any code that
installs/restores signal handlers and mutates previous_actions,
Bun__currentSyncPID, and Bun__pendingSignalToSend inside the spawnSync
implementation and its helper functions, and change the signal handler to verify
the PID against an atomic Bun__currentSyncPID before acting (so stray signals
are ignored); this ensures handlers/state cannot be clobbered by concurrent
spawnSync calls and restores handlers only when the caller that set them is
still the owner.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/jsc/bindings/c-bindings.cpp`:
- Around line 936-953: The signal-forwarding globals (previous_actions,
Bun__currentSyncPID, Bun__pendingSignalToSend) used by spawnSync are not safe
for overlapping calls; serialize access and make the handler logic robust:
introduce a single shared mutex (e.g., spawnSyncSignalMutex) and lock it around
any code that installs/restores signal handlers and mutates previous_actions,
Bun__currentSyncPID, and Bun__pendingSignalToSend inside the spawnSync
implementation and its helper functions, and change the signal handler to verify
the PID against an atomic Bun__currentSyncPID before acting (so stray signals
are ignored); this ensures handlers/state cannot be clobbered by concurrent
spawnSync calls and restores handlers only when the caller that set them is
still the owner.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b3b75814-6478-48d0-97e4-fbcdfd357797

📥 Commits

Reviewing files that changed from the base of the PR and between ad68af8 and 4b9d469.

📒 Files selected for processing (2)
  • src/jsc/bindings/c-bindings.cpp
  • test/js/bun/spawn/spawnSync-sigpwr-gc.test.ts

@github-actions
Copy link
Copy Markdown
Contributor

Found 1 issue this PR may fix:

  1. Flaky SIGILL crash in JSC garbage collector during bun test on Linux x64 (1.3.7 and 1.3.9) #27312 - Flaky SIGILL crash in JSC garbage collector during bun test on Linux x64; the intermittent GC crash in SlotVisitor::drain is consistent with a corrupted signal handler from the SIGPWR race condition this PR fixes

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #27312

🤖 Generated with Claude Code

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 test/js/bun/spawn/spawnSync-sigpwr-gc.test.ts:54 — nit: expect(stderr).not.toContain("AddressSanitizer") is the negative crash-marker pattern that root CLAUDE.md:126 says to NEVER write ("These tests will never fail in CI"). The positive assertions on signalCode, stdout === "ok", and exitCode === 0 already fail if the process dies via SIGPWR or ASAN aborts, so this line adds no signal — just drop it.

    Extended reasoning...

    What the issue is

    test/js/bun/spawn/spawnSync-sigpwr-gc.test.ts:54 adds:

    expect(stderr).not.toContain("AddressSanitizer");

    This is a negative assertion on a crash-marker string in subprocess output. The repo's root CLAUDE.md (line 126) explicitly forbids this pattern:

    NEVER write tests that check for no "panic" or "uncaught exception" or similar in the test output. These tests will never fail in CI.

    "AddressSanitizer" is exactly "or similar" — it's a crash-marker string just like "panic" or "uncaught exception".

    Why this assertion adds no signal — step by step

    Consider the two CI configurations:

    1. Non-ASAN build (the common case in CI). AddressSanitizer is not compiled in, so stderr can never contain "AddressSanitizer" regardless of whether the bug is present. The assertion is vacuously true and cannot fail — this is precisely what the CLAUDE.md rule warns about.

    2. ASAN build, bug regresses. If the SIGPWR handler gets clobbered and the process is terminated by signal 30, the child does not exit cleanly. In that case:

      • proc.signalCode is "SIGPWR", not null → line 53 expect(proc.signalCode).toBeNull() fails.
      • The child never reaches console.log("ok") → line 55 expect(stdout.trim()).toBe("ok") fails.
      • exitCode is non-zero / null → line 56 expect(exitCode).toBe(0) fails.

      And if ASAN itself reports a memory error and aborts, it does so with a non-zero exit code (and typically SIGABRT), so the same three positive assertions catch it before the stderr check matters.

    In every scenario where not.toContain("AddressSanitizer") would fail, at least one of the surrounding positive assertions has already failed. There is no execution path where this line is the sole failing assertion.

    Why the existing assertions are sufficient

    The test already follows the recommended structure: it asserts on signalCode === null, stdout.trim() === "ok", and exitCode === 0. These are positive assertions on observable process state and they fully cover the regression being tested (process terminated by SIGPWR during GC). The stderr check is pure belt-and-suspenders that violates a documented "NEVER" rule.

    Fix

    Just delete line 54. Replacing it with expect(stderr).toBe("") (the CLAUDE.md-preferred form) is also an option but may be too strict here — 64 concurrent openInEditor calls against a fake code binary could plausibly emit incidental noise on stderr without indicating failure, so dropping the line entirely is the safer choice.

    This is a test-hygiene/convention nit, not a functional bug in the SIGPWR fix itself.

Comment thread src/jsc/bindings/c-bindings.cpp
Comment thread src/jsc/bindings/c-bindings.cpp
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/bun/spawn/spawnSync-sigpwr-gc.test.ts`:
- Around line 41-49: Change the subprocess spawn options to pipe stderr (replace
stderr: "inherit" with stderr: "pipe"), capture proc.stderr.text() alongside
proc.stdout.text() and proc.exited (e.g. await Promise.all([... ,
proc.stderr.text(), proc.exited])), store that value in a stderr variable, and
insert the CI-friendly assertion if (exitCode !== 0) { expect(stderr).toBe("");
} immediately before expect(exitCode).toBe(0); keep the existing checks on
proc.signalCode and stdout.trim() as-is.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bea85ffa-50b0-4816-90cc-048b3b916ad6

📥 Commits

Reviewing files that changed from the base of the PR and between ac02046 and 2d5b42e.

📒 Files selected for processing (1)
  • test/js/bun/spawn/spawnSync-sigpwr-gc.test.ts

Comment on lines +41 to +49
stdout: "pipe",
stderr: "inherit",
});

const [stdout, exitCode] = await Promise.all([proc.stdout.text(), proc.exited]);

expect(proc.signalCode).toBeNull();
expect(stdout.trim()).toBe("ok");
expect(exitCode).toBe(0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Pipe stderr for better CI failure diagnostics.

With stderr: "inherit", you cannot programmatically assert stderr contents or surface them in the test diff on failure. Per repo conventions for subprocess tests using bunEnv, prefer piping stderr and asserting it's empty (after filtering ASAN noise if needed).

♻️ Suggested change
     stdout: "pipe",
-    stderr: "inherit",
+    stderr: "pipe",
   });

-  const [stdout, exitCode] = await Promise.all([proc.stdout.text(), proc.exited]);
+  const [stdout, stderr, exitCode] = await Promise.all([
+    proc.stdout.text(),
+    proc.stderr.text(),
+    proc.exited,
+  ]);

   expect(proc.signalCode).toBeNull();
   expect(stdout.trim()).toBe("ok");
+  if (exitCode !== 0) {
+    expect(stderr).toBe("");
+  }
   expect(exitCode).toBe(0);

Based on learnings: "In oven-sh/bun Jest/Bun test files under test/js/ that spawn subprocesses using bunEnv from the harness module, it's safe and intentional to assert expect(stderr).toBe("") unconditionally" and "write if (exitCode !== 0) { expect(stderr).toBe(""); } immediately before expect(exitCode).toBe(0)."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
stdout: "pipe",
stderr: "inherit",
});
const [stdout, exitCode] = await Promise.all([proc.stdout.text(), proc.exited]);
expect(proc.signalCode).toBeNull();
expect(stdout.trim()).toBe("ok");
expect(exitCode).toBe(0);
stdout: "pipe",
stderr: "pipe",
});
const [stdout, stderr, exitCode] = await Promise.all([
proc.stdout.text(),
proc.stderr.text(),
proc.exited,
]);
expect(proc.signalCode).toBeNull();
expect(stdout.trim()).toBe("ok");
if (exitCode !== 0) {
expect(stderr).toBe("");
}
expect(exitCode).toBe(0);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/js/bun/spawn/spawnSync-sigpwr-gc.test.ts` around lines 41 - 49, Change
the subprocess spawn options to pipe stderr (replace stderr: "inherit" with
stderr: "pipe"), capture proc.stderr.text() alongside proc.stdout.text() and
proc.exited (e.g. await Promise.all([... , proc.stderr.text(), proc.exited])),
store that value in a stderr variable, and insert the CI-friendly assertion if
(exitCode !== 0) { expect(stderr).toBe(""); } immediately before
expect(exitCode).toBe(0); keep the existing checks on proc.signalCode and
stdout.trim() as-is.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — one-line removal of SIGPWR from the Linux forwarding list, clearly correct given JSC's use of SIGPWR for GC thread suspension.

Extended reasoning...

Overview

This PR removes M(SIGPWR) from the FOR_EACH_LINUX_ONLY_SIGNAL macro in src/jsc/bindings/c-bindings.cpp (one deleted line plus a 3-line explanatory comment) and adds a Linux-only regression test in test/js/bun/spawn/spawnSync-sigpwr-gc.test.ts that stresses concurrent openInEditorspawnSync calls against GC to verify the SIGPWR handler is no longer clobbered.

Security risks

None. The change is purely subtractive — it stops spawnSync from replacing the process-wide SIGPWR handler on Linux. There is no new attack surface, no input handling, and no auth/crypto/permissions involvement. The test only manipulates a temp dir, a symlink to sleep, and PATH for a child Bun process.

Level of scrutiny

Low-to-moderate. Signal-handler registration is process-global state, so changes here deserve a careful read, but this particular change is a strict reduction in scope: one fewer signal is touched. The rationale (JSC uses SIGPWR for GC thread suspend/resume in WTF/wtf/posix/ThreadingPOSIX.cpp) is correct and well-documented, and the npm list this was copied from targets V8 where SIGPWR is unused — so dropping it has no user-visible downside while fixing a real fuzzer-found crash.

Other factors

My two earlier inline comments were explicitly flagged as pre-existing issues (SIGPOLL/SIGIOT aliasing, and the stale main-thread-only comment / broader previous_actions[] race). The author verified the aliasing observation and reasonably scoped both out to a follow-up to keep this PR focused on the fuzzer crash; neither blocks this change. CodeRabbit's stderr-piping suggestion is a test-diagnostics nit, not a correctness issue. No bugs were found by the bug-hunting system on this revision. CI had failures on an earlier commit and a follow-up commit (test: drop negative crash-marker assertion) was pushed; the source change itself is sound regardless.

@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented May 18, 2026

CI status: 68/69 checks pass. The one failure is darwin-aarch64-build-cpp with ninja: build stopped: interrupted by user at step 1093/1186 (compiling boringssl/.../bcm.cc.o) — build agent was interrupted mid-compile, not a code error. This PR's only source change is inside #if OS(LINUX) and isn't compiled on macOS; the same job passed on the previous build (55633).

The new spawnSync-sigpwr-gc.test.ts passed on all Linux targets including the previously-flaky ubuntu-25.04-aarch64.

Already re-rolled once for unrelated flakes on build 55633 (node-http-backpressure SIGKILL, bun-upgrade GitHubIsDown). Not re-rolling again; this just needs the macOS build job retried.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/jsc/bindings/c-bindings.cpp (1)

1016-1022: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only clear the forwarded PID on the outermost unregister.

Line 1018 resets Bun__currentSyncPID before the depth check. If two spawnSync calls overlap, the first inner unregister leaves signalForwardingDepth > 0 but disables forwarding for the still-active outer call, so a later signal in that window never reaches the child.

Suggested fix
 extern "C" void Bun__unregisterSignalsForForwarding()
 {
-    Bun__currentSyncPID = 0;
-
     std::lock_guard<std::mutex> lock(signalForwardingLock);
     if (--signalForwardingDepth != 0)
         return;
+
+    Bun__currentSyncPID = 0;
 
 `#define` UNREGISTER_SIGNAL(SIG)                                \
     if (sigaction(SIG, &previous_actions[SIG], NULL) == -1) { \
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/jsc/bindings/c-bindings.cpp` around lines 1016 - 1022, The code resets
Bun__currentSyncPID before checking signalForwardingDepth, which clears the
forwarded PID prematurely for nested spawnSync calls; modify
Bun__unregisterSignalsForForwarding so it acquires signalForwardingLock,
decrements signalForwardingDepth, and only when signalForwardingDepth reaches 0
does it clear Bun__currentSyncPID and return—leave Bun__currentSyncPID unchanged
when depth stays > 0 to ensure outer calls still forward signals.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/jsc/bindings/c-bindings.cpp`:
- Around line 1016-1022: The code resets Bun__currentSyncPID before checking
signalForwardingDepth, which clears the forwarded PID prematurely for nested
spawnSync calls; modify Bun__unregisterSignalsForForwarding so it acquires
signalForwardingLock, decrements signalForwardingDepth, and only when
signalForwardingDepth reaches 0 does it clear Bun__currentSyncPID and
return—leave Bun__currentSyncPID unchanged when depth stays > 0 to ensure outer
calls still forward signals.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 465df2fa-68f7-42bb-8adf-a435cfd0251b

📥 Commits

Reviewing files that changed from the base of the PR and between 2d5b42e and 7e2c283.

📒 Files selected for processing (1)
  • src/jsc/bindings/c-bindings.cpp

@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented May 18, 2026

CI on 7e2c283 (the reentrancy fix): 73/74 checks pass. The one failure is darwin-14-aarch64-test-bun with status Expired — the job timed out waiting for an agent and never ran. All 15 other test-bun jobs passed, including darwin-26-aarch64 (same arch, newer macOS) and every Linux target where the new spawnSync-sigpwr-gc test runs.

No test failures on this build; only auto-retried flakes in the warning annotations (hot.test.ts, spawn-stdin-readable-stream.test.ts, spawn.test.ts timeout — all on Windows and all pre-existing).

Already used my one retrigger on build 55649; not pushing another. The expired darwin-14-aarch64 job just needs a retry.

@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented May 20, 2026

Also fixes fuzzer fingerprint 66c688589d402c9f (#31131, closed as duplicate).

robobun and others added 5 commits May 23, 2026 16:05
JSC uses SIGPWR for GC thread suspend/resume on Linux. spawnSync's
signal-forwarding helper installed a SA_RESETHAND handler for SIGPWR
and saved the previous handler into a process-global array with no
synchronization. Bun.openInEditor spawns a detached thread per call
that runs spawnSync; concurrent calls raced on the previous-handler
table and left SIGPWR at SIG_DFL, so the next GC suspend terminated
the process with signal 30.
Bun.openInEditor spawns detached threads that call spawnSync, so
register/unregister can run concurrently. Guard previous_actions[]
with a mutex and depth counter so only the outermost pair installs
and restores handlers. Also drop SIGIOT/SIGPOLL from the list since
they alias SIGABRT/SIGIO and the second sigaction() was overwriting
the saved handler with our own.
@robobun robobun force-pushed the farm/a5eb0a5e/spawnsync-sigpwr branch from 7e2c283 to db5ce58 Compare May 23, 2026 16:16
@robobun robobun changed the title spawnSync: do not forward SIGPWR on Linux spawnSync: make signal-forwarding register/unregister safe for concurrent callers May 23, 2026
Comment on lines 1019 to +1023
Bun__currentSyncPID = 0;

std::lock_guard<std::mutex> lock(signalForwardingLock);
if (--signalForwardingDepth != 0)
return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Bun__currentSyncPID = 0; runs before the new lock + --signalForwardingDepth != 0 early-return, so an inner unregister (e.g. an openInEditor thread finishing while a main-thread spawnSync is still waiting) zeroes the outer caller's forwarding PID even though the depth guard is meant to make inner calls no-ops. You've already noted Bun__currentSyncPID remains a shared singleton, and moving this line below the guard trades a zeroed PID for a potentially stale (reaped) one in the opposite interleaving — so probably worth a one-line comment on why the reset is intentionally outside the guard rather than a code change.

Extended reasoning...

What this is

Bun__unregisterSignalsForForwarding() now does:

Bun__currentSyncPID = 0;                              // line 1019 — unconditional
std::lock_guard<std::mutex> lock(signalForwardingLock);
if (--signalForwardingDepth != 0)
    return;                                           // inner caller: no-op

The depth counter added in db5ce58 is meant to make nested register/unregister pairs no-ops so only the outermost pair touches process-wide state. But the PID reset on line 1019 sits above both the lock and the depth check, so every caller — inner or outer — zeroes Bun__currentSyncPID.

Concrete walk-through (the ordering where this bites)

  1. Bun.openInEditor detached thread: Bun__currentSyncPID.store(0) (process.rs:3138) → register() (depth 0→1, installs handlers) → spawn editor → store(editor_pid) → block in waitpid.
  2. Main thread Bun.spawnSync: store(0)register() (depth 1→2, early-return) → spawn child → store(child_pid) (process.rs:3152) → block in waitpid.
  3. Editor exits; detached thread's SignalForwarding guard drops → Bun__unregisterSignalsForForwarding()line 1019 sets Bun__currentSyncPID = 0 → depth 2→1 ≠ 0 → return.
  4. User presses Ctrl-C. The forwarding lambda sees Bun__currentSyncPID == 0 and stashes the signal in Bun__pendingSignalToSend instead of kill(child_pid, SIGINT). With SA_RESETHAND, the disposition is now SIG_DFL, so a second Ctrl-C kills bun without the child ever receiving the signal.

If line 1019 were below the depth check, step 3 would leave child_pid in place and step 4 would forward correctly.

Why this isn't already prevented

The new comment at lines 912-916 says "only the outermost pair touches process-wide signal dispositions" — and that's accurate for the sigaction calls and previous_actions[]. But Bun__currentSyncPID is also process-wide state read by the forwarding lambda, and it's reset outside the guard. Nothing else protects it: both Rust (process.rs:3138/3152) and Zig (process.zig:2396/2409) callers write it directly without checking depth.

Addressing the counter-argument

There's a reasonable case that the current placement is intentional. In the opposite interleaving (main-thread spawnSync registers first, openInEditor registers second), the inner caller has already overwritten the PID with 0 then editor_pid on its register path — so by the time the inner unregister runs, the outer PID is gone regardless. Moving line 1019 below the guard would then leave Bun__currentSyncPID == editor_pid (a freshly-reaped PID), and a subsequent signal would kill() a PID the kernel may have recycled. Zeroing it (current behavior) is the safer failure mode for that ordering: the signal is stashed rather than mis-delivered.

So neither placement is correct for all interleavings — that's the acknowledged singleton limitation, and a real fix needs per-caller PID tracking (out of scope here). The point of this comment is narrower: the placement looks like an oversight relative to the depth guard added two lines below it, and there is at least one realistic ordering (steps 1-4 above) where it discards a still-valid PID that moving it would preserve.

Suggested action

Given the stale-PID tradeoff, I'd lean toward leaving the code as-is and adding a one-line comment above line 1019 noting that the reset is deliberately outside the depth guard (zero is safer than a possibly-reaped inner PID). Alternatively, drop line 1019 entirely — every caller already does store(0) immediately before register() (process.rs:3138, process.zig:2396), so the outermost unregister doesn't need it, and removing it fixes the step-3 ordering above without introducing the stale-PID case.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — the reset is intentionally outside the depth guard. With overlapping callers the PID slot is already a last-writer-wins singleton, so by the time an inner unregister runs, the slot usually holds the inner (now-reaped) child anyway. Zeroing it makes the handler fall back to the pending-signal path instead of kill()ing a possibly-recycled PID, which is the safer failure mode. A real fix is per-caller PID tracking, which is out of scope here. Leaving the code as-is; happy to add the one-line comment if a maintainer prefers it inline.

@robobun
Copy link
Copy Markdown
Collaborator Author

robobun commented May 23, 2026

CI on db5ce58 (rebased onto main): 67 pass, 3 fail, 4 pending.

The failing jobs (debian-13-x64, debian-13-x64-baseline, plus the umbrella) are all the same test: test/regression/issue/8254.test.ts killed by SIGKILL ("main process killed by SIGKILL but no core file found"). That test allocates a ~2GB blob and writes a 2GB file — no spawn, no signals — and the identical failure (same test, same 4 Linux x64 job types) appears on build 57343 from unrelated PR #30671, so it's an OOM-kill flake on those agents, not something introduced here.

The changes in this PR (signal-forwarding mutex/depth guard, SIGIOT/SIGPOLL alias removal, regression test) passed everywhere they run, including debian-13-x64-asan, all aarch64 Linux jobs, and macOS/Windows.

Already used my one retrigger earlier; not pushing another. The 8254 OOM flake needs either a job retry or a fix to that test's memory footprint (separate issue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants